| Nate E TeBlunthuis | 6d4344355b | Create parquet datasets of reddit submissions from pushshift. | 2020-07-05 23:20:17 -07:00 |
| Nate E TeBlunthuis | 6dca79a41f | Rename spark script to reflect that it is for comments. | 2020-07-03 14:00:36 -07:00 |
| Nate E TeBlunthuis | 94c7a74bd9 | update .gitignore | 2020-07-03 13:55:25 -07:00 |
| Nate E TeBlunthuis | 4dd9a210e6 | bugfix in retrieving old data and rename file. | 2020-07-03 13:54:55 -07:00 |
| Nate E TeBlunthuis | c972d828b3 | Script for checking shas for submissions. | 2020-07-03 13:35:46 -07:00 |
| Nate E TeBlunthuis | 7da18e33ba | Bugfix: use timestamp types. Also change the canonical file path. | 2020-07-03 11:38:43 -07:00 |
| Nate E TeBlunthuis | db2d6248fc | update the reddit comment dumps | 2020-07-03 10:41:13 -07:00 |
| Nate E TeBlunthuis | d05da6441f | Don't clobber old dumps so that we can just download the new ones. | 2020-07-03 10:40:43 -07:00 |
| Nate E TeBlunthuis | 592d2c7dda | script for getting submissions dumps from pushshift. | 2020-07-02 17:40:17 -07:00 |
| Nate E TeBlunthuis | 64e9408a65 | Extract variables from pushshift comment to parquet. A spark script | 2020-07-02 14:35:55 -07:00 |