Library

library(rvest)
library(tidyverse)
library(htm2txt) # convert html to txt. 
library(reshape2)
library(gt)

10-K: https://www.sec.gov/files/form10-k.pdf

10-Q: https://www.sec.gov/files/form10-q.pdf

1. Parameters

Here we use a 10-Q filing from Apple:

# 1. parameters ---- 
filing_qrt <- "QTR2" # the filing quarter 
loc_file <- "Edgar filings_full text/Form 10-Q/320193/320193_10-Q_2015-07-22_0001193125-15-259935.txt"

# 2. import filing -----
filing <- readLines(loc_file) # read txt filing 

2. Function to Extract Filing Info

Function filing.header extracts the filing information from the header.

filing.header <- function(x, # the file 
                          regex_header = 'ACCESSION NUMBER:|</SEC-HEADER>' # the regex of the start to end of the header section in the filing
) { # parse filing header info 
  header <- grep(pattern = regex_header, x = x, perl = TRUE) # search the argument `x`, not the global `filing`
  header_cleaned <- str_squish(x[header[1]:(header[2]-1)]) 
  header_info <- str_split_fixed(header_cleaned[header_cleaned != ""], 
                           pattern = ":\\s", 2)
  return(header_info)
}
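As a minimal sketch of what the splitting step does, the same logic can be run on a few in-memory lines. The `toy_filing` vector below is a hypothetical miniature header written for illustration, not an excerpt from the real file. Lines that end with a bare colon, such as `FILER:`, contain no ": " separator, so they stay whole in the first column with an empty second column.

```r
library(stringr)

# Hypothetical miniature filing: a few header lines in the EDGAR layout
toy_filing <- c(
  "<SEC-HEADER>",
  "ACCESSION NUMBER:\t\t0001193125-15-259935",
  "CONFORMED SUBMISSION TYPE:\t10-Q",
  "",
  "FILER:",
  "",
  "\tCOMPANY DATA:",
  "\t\tCOMPANY CONFORMED NAME:\t\t\tAPPLE INC",
  "</SEC-HEADER>",
  "<DOCUMENT>"
)

# Locate the first and last line of the header block
bounds <- grep("ACCESSION NUMBER:|</SEC-HEADER>", toy_filing)

# Collapse tabs and repeated spaces, then drop the empty lines
lines <- str_squish(toy_filing[bounds[1]:(bounds[2] - 1)])
lines <- lines[lines != ""]

# Split each line once at the first ": " into label and value
info <- str_split_fixed(lines, pattern = ":\\s", n = 2)
print(info)
```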

The extracted filing info looks like this:

filing_header <- filing.header(x = filing)
  filing_type <- filing_header[2,2] # get the filing type (10-Q/K)  
  filing_cik <- filing_header[10,2] # get cik 

gt(data = as.data.frame(filing_header)) %>%
  cols_label(V1 = "Label", V2 = "Info")
Label | Info
ACCESSION NUMBER | 0001193125-15-259935
CONFORMED SUBMISSION TYPE | 10-Q
PUBLIC DOCUMENT COUNT | 11
CONFORMED PERIOD OF REPORT | 20150627
FILED AS OF DATE | 20150722
DATE AS OF CHANGE | 20150722
FILER: |
COMPANY DATA: |
COMPANY CONFORMED NAME | APPLE INC
CENTRAL INDEX KEY | 0000320193
STANDARD INDUSTRIAL CLASSIFICATION | ELECTRONIC COMPUTERS [3571]
IRS NUMBER | 942404110
STATE OF INCORPORATION | CA
FISCAL YEAR END | 0927
FILING VALUES: |
FORM TYPE | 10-Q
SEC ACT | 1934 Act
SEC FILE NUMBER | 001-36743
FILM NUMBER | 151000501
BUSINESS ADDRESS: |
STREET 1 | ONE INFINITE LOOP
CITY | CUPERTINO
STATE | CA
ZIP | 95014
BUSINESS PHONE | (408) 996-1010
MAIL ADDRESS: |
STREET 1 | ONE INFINITE LOOP
CITY | CUPERTINO
STATE | CA
ZIP | 95014
FORMER COMPANY: |
FORMER CONFORMED NAME | APPLE COMPUTER INC
DATE OF NAME CHANGE | 19970808
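The row positions used above (`filing_header[2,2]` and `filing_header[10,2]`) depend on the header layout of this particular filing. A lookup by label may be more robust if other filings add or drop header lines. The sketch below assumes a two-column character matrix like the one returned by filing.header; `header_value` and `toy_header` are hypothetical names introduced for illustration.

```r
# Hypothetical helper: return the value of the first row whose label matches
header_value <- function(header, label) {
  hit <- which(header[, 1] == label)
  if (length(hit) == 0) return(NA_character_) # label not present
  header[hit[1], 2]
}

# Small stand-in for the matrix returned by filing.header
toy_header <- rbind(
  c("ACCESSION NUMBER", "0001193125-15-259935"),
  c("CONFORMED SUBMISSION TYPE", "10-Q"),
  c("FILER:", ""),
  c("CENTRAL INDEX KEY", "0000320193")
)

filing_type <- header_value(toy_header, "CONFORMED SUBMISSION TYPE")
filing_cik <- header_value(toy_header, "CENTRAL INDEX KEY")
```

Labels such as STATE or CITY appear more than once in the header (business and mail address), so this helper returns only the first match.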

3. Function to Extract Table of Contents and Locate Item of Interest

Function filing.toc extracts the Table of Contents (toc) from the filing.

Function loc.item locates the item of interest, which potentially contains share repurchase information, in the Table of Contents.

filing.toc <- function(x, # filing 
                       regex_toc = '<text>|</text>' # locate ToC
){ # find the table of content(s)
  toc <- grep(pattern = regex_toc, x = x, ignore.case = T)[1:2] # the part containing the ToC
  filing_toc <- read_html(paste0(x[toc[1]:toc[2]], collapse = "")) # extract the toc
  return(filing_toc)
} 

loc.item <- function(x, # filing 
                     filing_type, # filing type from the previous input
                     regex_item = c("Unregistered Sales of Equity Securities and Use of Proceeds",
                                    "Market for Registrant’s Common Equity, Related Stockholder Matters and Issuer Purchases of Equity Securities") 
) { # locate the section of the item of interest 
    # > item 2 in 10-Q: "Unregistered Sales of Equity Securities and Use of Proceeds" ;
    # > item 5 in 10-K: "Market for Registrant’s Common Equity, Related Stockholder Matters and Issuer Purchases of Equity Securities" ;
  toc <- filing.toc(x = x) # use the argument `x`, not the global `filing`
  
  regex <- regex_item[filing_type == c("10-Q", "10-K")] # identify the regex 
  toc_txt <- html_nodes(html_nodes(toc, "table"), "a") 
  
  item_id <- gsub(x = unique(html_attr(toc_txt[which(grepl(pattern = regex,
                                        x = html_text(toc_txt), 
                                        ignore.case = T)) + 0:6],"href"))[1:2],
                  pattern = '#', replacement = '')
  loc_item <- vapply(X = item_id,
                     FUN = function(p) {
                        loc_item0 <- grep(pattern = p, x = x, fixed = T)
                        return(ifelse(length(loc_item0) != 1, loc_item0[2], loc_item0[1]))
                      },
                     FUN.VALUE = numeric(1))

   return(list(loc_item = loc_item, item_id = item_id))
}
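The lookup in loc.item can be sketched on a tiny hand-written filing. Everything in `toy_filing` (anchor names, page numbers, text) is hypothetical. The sketch takes the `href` of the matching ToC link and of the link that follows it, then finds the lines where those anchors occur. Each anchor appears once in the ToC and once in the body, so the second occurrence is taken as the start of the item, mirroring the `ifelse` in loc.item.

```r
library(rvest)

# Hypothetical miniature filing: a ToC table followed by the item anchors
toy_filing <- c(
  '<table>',
  '<tr><td><a href="#tx1_2">Item 2. Unregistered Sales of Equity Securities and Use of Proceeds</a></td><td><a href="#tx1_2">30</a></td></tr>',
  '<tr><td><a href="#tx1_3">Item 3. Defaults Upon Senior Securities</a></td><td><a href="#tx1_3">31</a></td></tr>',
  '</table>',
  '<p><a name="tx1_2"></a>Item 2. Unregistered Sales of Equity Securities and Use of Proceeds</p>',
  '<p>Repurchase details</p>',
  '<p><a name="tx1_3"></a>Item 3. Defaults Upon Senior Securities</p>'
)

# Pick the item title that belongs to the filing type
regex_item <- c("Unregistered Sales of Equity Securities and Use of Proceeds",
                "Issuer Purchases of Equity Securities")
regex <- regex_item["10-Q" == c("10-Q", "10-K")]

# All links inside the ToC table
links <- html_nodes(html_nodes(read_html(paste0(toy_filing, collapse = "")), "table"), "a")
hit <- which(grepl(regex, html_text(links), ignore.case = TRUE))[1]

# Title and page-number links share an href, so keep the first two unique ones
hrefs <- html_attr(links, "href")
item_id <- sub("#", "", unique(hrefs[hit:length(hrefs)])[1:2], fixed = TRUE)

# Line of each anchor: skip the ToC occurrence when there is more than one hit
loc_item <- vapply(item_id, function(p) {
  hits <- grep(p, toy_filing, fixed = TRUE)
  if (length(hits) > 1) hits[2] else hits[1]
}, FUN.VALUE = integer(1))
print(loc_item)
```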

Function loc.item automatically locates Item 2 in a 10-Q or Item 5 in a 10-K based on the filing type. Running filing.toc on the variable filing stores the toc in the variable filing_title, and running loc.item returns the line range of the item of interest:

filing_title <- filing.toc(x = filing)
filing_title
## {html_document}
## <html>
## [1] <body><text><title>Form 10-Q</title>\n<h5 align="left"><a href="#toc">Tab ...
loc_item2 <- loc.item(x = filing, filing_type = filing_type) # the location for the item 2/5 
cat("The item of interest is from line", loc_item2$loc_item[1], "to", loc_item2$loc_item[2], ".")
## The item of interest is from line 13139 to 13405 .

4. Function to Extract Info from the Item

Function filing.item extracts the information in Item 2 of a 10-Q (Item 5 of a 10-K). x contains the text of the filing. loc_item gives the position of the item as returned by function loc.item, and item_id holds the matching anchor identifiers. filing_qrt is the filing quarter, defined in Section 1 and passed in as an argument.

Function filing.item_txt extracts the text before and after the table in the item and returns it in $header and $footnote.

## extract the txt header and/or footnote from the item
filing.item_txt <- function(item_txt, # txt in character for the item
                                item_tbl_id, # the location of the table in the item `item_txt`
                                parts = c("header", "footnote") # kept parts
    ) {
      if (length(parts) != 0) { # at least one part was requested
        parts_id <- c(1, grep('<(/?)table', item_txt, ignore.case = T)[1 + 2*item_tbl_id + (-1:0)], length(item_txt))
        loc_item_section <- list(header = parts_id[1:2],
                                 footnote = parts_id[3:4])
        
        filing_item2_txt <- lapply(X = loc_item_section[parts],
                                   FUN = function(id) {html_text(read_html(paste(item_txt[id[1]:id[2]], collapse = ""),
                                   options = 'HUGE'), trim = T)} )
        return(filing_item2_txt)
      } else {
        return(list( header = NULL, footnote = NULL ))
      }
    }


## extract section/item txt 
filing.item <- function(x, # filing
                        loc_item, # the location of the item of interest
                        item_id, # the identifier from 'href' for the section 
                        filing_qrt, # the quarter the filing was made 
                        table = TRUE, # whether to scrape the table numbers 
                        parts = c("footnote") # the parts of information that you want 
) { # extract info from the section/item 
  if (loc_item[1] == loc_item[2]) {
    item_parse <- str_split_fixed(string = x[loc_item[1]:loc_item[2]],
                                  pattern = item_id[1], n = Inf) %>% .[1, ncol(.)]
    item_txt <- str_extract(string = item_parse, 
                    pattern = paste0("^(.*?)", item_id[2], collapse = ""))
  } else {
  # the full item 
    item_txt <- x[loc_item[1]:loc_item[2]] 
  }
  # find the table(s) 
  item_html <- read_html(paste0(item_txt, collapse = ""))
  item_tbls <- html_nodes(item_html, "table")
  item_tbl_id <- grep(pattern = "Total", x = item_tbls, fixed = T)[1]  # identify the correct table
  
  ## extract the table 
  if (!is.na(item_tbl_id)) {
    ## 
    if (!all(parts %in% c("header", "footnote"))) { # `if` needs a single TRUE/FALSE
      stop("Invalid `parts` argument: use \"header\" and/or \"footnote\"")
    } else {
      item_htm2txt <- html_text(item_html, trim = T) # pure text document 
      filing_item2_txt <- strsplit(x = item_htm2txt, split = (html_text(item_tbls[[item_tbl_id]], trim = F)), fixed = T)[[1]][match(parts, c("header", "footnote"))]
    }
    
    ### extract the unit information 
    item_table_unit <- c(na.omit((str_extract(string = item_htm2txt,
                                              pattern = str_extract(html_text(item_html), pattern = "\\(([^()]+)\\)")))))
    
    ### <Tables starts here!>
    ### clean the table 
    item_table <- unique.matrix(as.matrix(html_table(item_tbls[[item_tbl_id]])))[-1,]
    tbl_periods_id <- grep(pattern = '(\\w+\\d{1,2},\\s+\\d{4}|Total|total)', item_table[,1]) # id_row for the periods
    tbl_periods <- rep(item_table[tbl_periods_id,1],
                       times = c(diff(tbl_periods_id), 1) 
    ) # return the periods 
    tbl_periods[tbl_periods == "Total"] <- filing_qrt # entering the filing quarter
    
    tbl_title <- c("item", item_table[1,][-1])
    tbl_numbers <- item_table[-(1:(tbl_periods_id[1]-1)),] %>% # remove the first line
      cbind(., "period" =`length<-`(tbl_periods, nrow(.))) %>%  # add 'period' column 
      .[-(tbl_periods_id[which(c(diff(tbl_periods_id), 1) != 1)] - (tbl_periods_id[1]-1)), # clean duplicated rows 
        c(TRUE, duplicated(tbl_title[-1], incomparables = c(NA, "")), TRUE)] # clean duplicated columns
    
    tbl_numbers <- matrix(str_replace(tbl_numbers,
                                      pattern = "\\$|(\\s*?)\\(\\d\\)",
                                      replacement = ""),
                          ncol = ncol(tbl_numbers), 
                          dimnames = list(NULL,
                                          c("item",
                                            tbl_title[duplicated(tbl_title[-1], incomparables = c(NA, ""))],
                                            "period")))
    ### return the cleaned table
    tbl_numbers_cleaned <- melt(as_tibble(tbl_numbers), id.vars = c("item", "period")) # as.tibble() is deprecated
    
    return(list(table = tbl_numbers_cleaned,
                parts = filing_item2_txt, 
                table_unit = item_table_unit
    ))
    
  } else { # if no table in the item 
    return(list(table = NULL,
                parts = NULL,  
                table_unit = NULL
    ))
  }
}
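The period-filling step in filing.item (the `rep(..., times = c(diff(...), 1))` call) can be illustrated on its own. In the sketch below, `first_col` is a hypothetical first column of a repurchase table in which each period row is followed by its transaction rows. The regular expression is the one used in the function.

```r
# Hypothetical first column: period rows followed by their transaction rows
first_col <- c(
  "March 29, 2015 to May 2, 2015:",
  "Open market and privately negotiated purchases",
  "May 3, 2015 to May 30, 2015:",
  "May 2015 ASR",
  "Open market and privately negotiated purchases",
  "Total"
)

# Rows that hold a period label (a date range) or the total
period_rows <- grep("(\\w+\\d{1,2},\\s+\\d{4}|Total|total)", first_col)

# Repeat each label until the next period row begins; the last one covers one row
periods <- rep(first_col[period_rows], times = c(diff(period_rows), 1))

print(data.frame(row = first_col, period = periods))
```

Each transaction row now carries the period it belongs to. The period rows themselves are the ones the function later drops as duplicates.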

Function filing.item first extracts the item text and looks for the table that records share repurchase information; the index of that table is stored in item_tbl_id. If no such table is found (item_tbl_id is NA), the function returns NULL for every element, which indicates that no repurchase was reported in the quarter. If a table is found (!is.na(item_tbl_id)), the function cleans the table ($table), splits the surrounding text into the part before the table (header) and the part after it (footnote) and returns the requested ones in $parts, and extracts the unit of the numbers in the table ($table_unit).
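The header/footnote split can be sketched in isolation: the plain text of the whole item is split on the plain text of the table, which leaves the text before and after it. The HTML in `toy_item` is hypothetical and much simpler than a real filing. Unlike filing.item, which runs grep on the nodes directly, the sketch matches "Total" against the table text.

```r
library(rvest)

# Hypothetical item: intro paragraph, repurchase table, footnote paragraph
toy_item <- read_html(paste0(
  "<p>Item 2. Intro text.</p>",
  "<table><tr><td>Period</td><td>Shares</td></tr>",
  "<tr><td>Total</td><td>100</td></tr></table>",
  "<p>(1) Footnote text.</p>"
))

# Identify the table that contains a "Total" row
tbls <- html_nodes(toy_item, "table")
tbl_id <- which(grepl("Total", html_text(tbls), fixed = TRUE))[1]

# Split the full text on the table text: element 1 is the header, element 2 the footnote
full_txt <- html_text(toy_item, trim = TRUE)
tbl_txt <- html_text(tbls[[tbl_id]], trim = FALSE)
parts <- strsplit(full_txt, split = tbl_txt, fixed = TRUE)[[1]]

print(parts[match("footnote", c("header", "footnote"))])
```

This relies on the table text appearing verbatim inside the full text, which holds here because both strings come from the same parsed document.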

For reference, compare the 10-K filings from Apple in 2019 and from Twitter in 2019 and 2021. Apple in 2019 and Twitter in 2021 both reported share repurchases in their 10-Ks, whereas Twitter did not repurchase shares in 2019 and its 10-K has no section reporting this information. The same goes for the NVIDIA 10-Q for 2021 QTR1.

4.1 Cleaned Repurchase Data

The original filing on EDGAR is here.

From the Apple 10-Q for 2015 QTR2, the cleaned repurchase table is:

item2_cleaned <- filing.item(x = filing,
                             loc_item = loc_item2$loc_item,
                             item_id = loc_item2$item_id,
                             filing_qrt = filing_qrt)

gt(data = as.data.frame(item2_cleaned$table)) %>%
  tab_footnote(footnote = item2_cleaned$table_unit, 
               locations = cells_column_labels(columns = value))
item | period | variable | value [1]
Open market and privately negotiated purchases | March 29, 2015 to May 2, 2015: | Total Numberof SharesPurchased | 6,364
May 2015 ASR | May 3, 2015 to May 30, 2015: | Total Numberof SharesPurchased | 38,320
Open market and privately negotiated purchases | May 3, 2015 to May 30, 2015: | Total Numberof SharesPurchased | 20,190
Open market and privately negotiated purchases | May 31, 2015 to June 27, 2015: | Total Numberof SharesPurchased | 4,677
Total | QTR2 | Total Numberof SharesPurchased | 69,551
Open market and privately negotiated purchases | March 29, 2015 to May 2, 2015: | AveragePrice PaidPer Share | 126.49
May 2015 ASR | May 3, 2015 to May 30, 2015: | AveragePrice PaidPer Share |
Open market and privately negotiated purchases | May 3, 2015 to May 30, 2015: | AveragePrice PaidPer Share | 128.53
Open market and privately negotiated purchases | May 31, 2015 to June 27, 2015: | AveragePrice PaidPer Share | 128.28
Total | QTR2 | AveragePrice PaidPer Share |
Open market and privately negotiated purchases | March 29, 2015 to May 2, 2015: | Total Number ofSharesPurchased asPart of PubliclyAnnouncedPlans orPrograms | 6,364
May 2015 ASR | May 3, 2015 to May 30, 2015: | Total Number ofSharesPurchased asPart of PubliclyAnnouncedPlans orPrograms | 38,320
Open market and privately negotiated purchases | May 3, 2015 to May 30, 2015: | Total Number ofSharesPurchased asPart of PubliclyAnnouncedPlans orPrograms | 20,190
Open market and privately negotiated purchases | May 31, 2015 to June 27, 2015: | Total Number ofSharesPurchased asPart of PubliclyAnnouncedPlans orPrograms | 4,677
Total | QTR2 | Total Number ofSharesPurchased asPart of PubliclyAnnouncedPlans orPrograms |
Open market and privately negotiated purchases | March 29, 2015 to May 2, 2015: | ApproximateDollar Value ofShares ThatMay Yet BePurchasedUnder thePlans orPrograms (1) |
May 2015 ASR | May 3, 2015 to May 30, 2015: | ApproximateDollar Value ofShares ThatMay Yet BePurchasedUnder thePlans orPrograms (1) |
Open market and privately negotiated purchases | May 3, 2015 to May 30, 2015: | ApproximateDollar Value ofShares ThatMay Yet BePurchasedUnder thePlans orPrograms (1) |
Open market and privately negotiated purchases | May 31, 2015 to June 27, 2015: | ApproximateDollar Value ofShares ThatMay Yet BePurchasedUnder thePlans orPrograms (1) |
Total | QTR2 | ApproximateDollar Value ofShares ThatMay Yet BePurchasedUnder thePlans orPrograms (1) | 50,050
[1] in millions, except number of shares, which are reflected in thousands, and per share amounts
table_var <- as.character(unique(item2_cleaned$table$variable))
item2_cleaned$table_unit
## [1] "in millions, except number of shares, which are reflected in thousands, and per share amounts"
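The unit string comes from the first parenthetical in the item text. A minimal sketch of that idea follows, using base R `regmatches` rather than the stringr calls in filing.item. The sentence in `txt` is hypothetical wording built around the unit string shown above.

```r
# Hypothetical sentence that precedes a repurchase table
txt <- paste0("Share repurchase activity during the quarter was as follows ",
              "(in millions, except number of shares, which are reflected ",
              "in thousands, and per share amounts):")

# First parenthetical that contains no nested parentheses
m <- regmatches(txt, regexpr("\\(([^()]+)\\)", txt))

# Strip the enclosing parentheses
unit <- gsub("^\\(|\\)$", "", m)
print(unit)
```

If an item contains an earlier parenthetical that is unrelated to the table, this pattern would return that one instead, so the result is worth checking by eye.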

As for units, share counts are typically reported in thousands or millions, dollar values in millions, and per-share prices in dollars.
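The values in the cleaned table are still character strings with thousands separators. A short sketch of converting them to numbers follows. `to_number` is a hypothetical helper, and the multiplier of 1,000 assumes the unit stated in this filing (share counts in thousands).

```r
# Hypothetical helper: drop "$" and "," and turn empty cells into NA
to_number <- function(x) {
  x <- gsub("[$,]", "", trimws(x))
  x[x == ""] <- NA
  as.numeric(x)
}

# Share counts from the table above, reported in thousands
shares_thousands <- to_number(c("6,364", "38,320", "20,190", "4,677"))
shares <- shares_thousands * 1000 # assumption: unit is thousands of shares

sum(shares_thousands) # 69551, the Total row of the table
```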

4.2 Cleaned Item Header/Footnote

By default (parts = "footnote"), filing.item keeps in $parts only the footnote text that follows the table and discards the text before the table. Passing parts = "footnote" explicitly returns the same result.

item2_cleaned$parts # here is the default footnote
## [1] "  (1)  In 2012, the Company\u0092s Board of Directors authorized a program to repurchase up to $10 billion of the Company\u0092s common stock beginning in 2013. TheCompany\u0092s Board of Directors increased the authorization to repurchase the Company\u0092s common stock to $60 billion in April 2013, to $90 billion in April 2014 and to $140 billion in April 2015. As of June 27, 2015, $90 billion of the$140 billion had been utilized. The remaining $50 billion in the table represents the amount available to repurchase shares under the authorized repurchase program as of June 27, 2015. The Company\u0092s share repurchase program does notobligate it to acquire any specific number of shares. Under the program, shares may be repurchased in privately negotiated and/or open market transactions, including under plans complying with Rule 10b5-1 under the Exchange Act.    (2)  In May 2015, the Company entered into a new accelerated share repurchase arrangement (\u0093ASR\u0094) to purchase up to $6.0 billion of the Company\u0092scommon stock. In exchange for up-front payments totaling $6.0 billion, the financial institutions committed to deliver shares during the ASR\u0092s purchase period, which will end in or before November 2015. The total number of shares ultimatelydelivered, and therefore the average price paid per share, will be determined at the end of the applicable purchase period based on the volume weighted-average price of the Company\u0092s common stock during that period. During the third quarter of2015, 38.3 million net shares were delivered and retired under the May 2015 ASR, and the final number of shares to be delivered will be determined at the conclusion of the purchase period.   Item 3."
filing.item(x = filing, loc_item = loc_item2$loc_item, item_id = loc_item2$item_id, filing_qrt = filing_qrt, parts = "footnote")$parts
## [1] "  (1)  In 2012, the Company\u0092s Board of Directors authorized a program to repurchase up to $10 billion of the Company\u0092s common stock beginning in 2013. TheCompany\u0092s Board of Directors increased the authorization to repurchase the Company\u0092s common stock to $60 billion in April 2013, to $90 billion in April 2014 and to $140 billion in April 2015. As of June 27, 2015, $90 billion of the$140 billion had been utilized. The remaining $50 billion in the table represents the amount available to repurchase shares under the authorized repurchase program as of June 27, 2015. The Company\u0092s share repurchase program does notobligate it to acquire any specific number of shares. Under the program, shares may be repurchased in privately negotiated and/or open market transactions, including under plans complying with Rule 10b5-1 under the Exchange Act.    (2)  In May 2015, the Company entered into a new accelerated share repurchase arrangement (\u0093ASR\u0094) to purchase up to $6.0 billion of the Company\u0092scommon stock. In exchange for up-front payments totaling $6.0 billion, the financial institutions committed to deliver shares during the ASR\u0092s purchase period, which will end in or before November 2015. The total number of shares ultimatelydelivered, and therefore the average price paid per share, will be determined at the end of the applicable purchase period based on the volume weighted-average price of the Company\u0092s common stock during that period. During the third quarter of2015, 38.3 million net shares were delivered and retired under the May 2015 ASR, and the final number of shares to be delivered will be determined at the conclusion of the purchase period.   Item 3."
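The footnote text shows two kinds of residue. The `\u0092`, `\u0093` and `\u0094` escapes are most likely Windows-1252 punctuation (right single quote, left and right double quotes) that was read as control characters, and words such as "TheCompany" appear to be fused where two source lines were joined without a separator. The sketch below is one possible clean-up under those assumptions; `clean_quotes` is a hypothetical helper, not part of the functions above.

```r
# Hypothetical helper: map the three control characters to the intended quotes
clean_quotes <- function(x) {
  x <- gsub("\u0092", "\u2019", x, fixed = TRUE) # right single quote
  x <- gsub("\u0093", "\u201C", x, fixed = TRUE) # left double quote
  x <- gsub("\u0094", "\u201D", x, fixed = TRUE) # right double quote
  x
}

cleaned <- clean_quotes("the Company\u0092s Board of Directors")

# Joining lines with "" fuses the last and first word of adjacent lines
fused <- paste0(c("2013. The", "Company"), collapse = "")
spaced <- paste0(c("2013. The", "Company"), collapse = " ")
print(c(fused, spaced))
```

Fused words cannot be repaired reliably after the fact, so collapsing with a space when the lines are first joined is probably the simpler option.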

5. Running Time

The running time for parsing one 10-Q filing is:

## Time difference of 0.3136666 secs
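The timing code is not shown above. One common way to produce a "Time difference of ... secs" line is to subtract two `Sys.time()` values, as in the sketch below. The workload line is a placeholder standing in for the calls to filing.header, loc.item and filing.item.

```r
start_time <- Sys.time()

# Placeholder workload; in the notebook this would be the parsing calls
invisible(sum(sqrt(seq_len(1e5))))

elapsed <- Sys.time() - start_time # a difftime object
print(elapsed)
```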
