Nate E TeBlunthuis | 4eb82d2740 | Fix whitespace at top of file. | 2020-07-05 23:32:00 -07:00
Nate E TeBlunthuis | 34185337c9 | Secondary sort for the by_author dataset should be CreatedAt. | 2020-07-05 23:29:35 -07:00
Nate E TeBlunthuis | 67857a3b05 | Create a second dataset sorted by author. | 2020-07-05 23:27:05 -07:00
Nate E TeBlunthuis | 6d4344355b | Create parquet datasets of reddit submissions from pushshift. | 2020-07-05 23:20:17 -07:00
Nate E TeBlunthuis | 6dca79a41f | Rename spark script to reflect that it is for comments. | 2020-07-03 14:00:36 -07:00
Nate E TeBlunthuis | 94c7a74bd9 | update .gitignore | 2020-07-03 13:55:25 -07:00
Nate E TeBlunthuis | 4dd9a210e6 | bugfix in retrieving old data and rename file. | 2020-07-03 13:54:55 -07:00
Nate E TeBlunthuis | c972d828b3 | Script for checking shas for submissions. | 2020-07-03 13:35:46 -07:00
Nate E TeBlunthuis | 7da18e33ba | Bugfix: use timestamp types | 2020-07-03 11:38:43 -07:00
    Also change the canonical file path.
Nate E TeBlunthuis | db2d6248fc | update the reddit comment dumps | 2020-07-03 10:41:13 -07:00
Nate E TeBlunthuis | d05da6441f | Don't clobber old dumps so that we can just download the new ones. | 2020-07-03 10:40:43 -07:00
Nate E TeBlunthuis | 592d2c7dda | script for getting submissions dumps from pushshift. | 2020-07-02 17:40:17 -07:00
Nate E TeBlunthuis | 64e9408a65 | Extract variables from pushshift comment to parquet | 2020-07-02 14:35:55 -07:00
    A spark script