a883cb536b 
							
						 
					 
					
						
						
							
							add note to readme about dependency on compression software  
						
						
						
					 
					
						2018-07-04 15:20:52 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
						
						
							
						
						
							e925ac9da1 
							
						 
					 
					
						
						
							
							add tests for wikipedia, malformed xml, and bzip2; correct bz2 bug in wikiq.  
						
						
						
					 
					
						2018-07-04 15:08:30 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
						
						
							
						
						
							d2746879d0 
							
						 
					 
					
						
						
							
							create baseline tests for xml dump processing  
						
						
						
					 
					
						2018-07-03 23:43:47 -07:00 
						 
				 
			
				
					
						
							
							
								Benjamin Mako Hill 
							
						 
					 
					
						
						
						
						
							
						
						
							ba886ecf4c 
							
						 
					 
					
						
						
							
							a number of small updates and fixes  
						
						
						
						
						- fix regex for filename/filetype matches
- unload all files in 7z archives, not just ones that end with xml
- fix bug that broke stdout
- minor cosmetic fixes
- updated mediawiki-utilities submodule to latest version 
						
					 
					
						2018-05-17 14:37:20 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
						
						
							
						
						
							3f9da40747 
							
						 
					 
					
						
						
							
							support 7z archives with multiple files. add urlencode parameter  
						
						
						
					 
					
						2017-12-07 15:10:56 -08:00 
						 
				 
			
				
					
						
							
							
								Benjamin Mako Hill 
							
						 
					 
					
						
						
						
						
							
						
						
							5d7dceb9e4 
							
						 
					 
					
						
						
							
							fix code to work with bzip files  
						
						
						
					 
					
						2017-02-06 18:25:17 -08:00 
						 
				 
			
				
					
						
							
							
								Benjamin Mako Hill 
							
						 
					 
					
						
						
						
						
							
						
						
							7d8ec932dd 
							
						 
					 
					
						
						
							
							added list of compressed dump files to .gitignore  
						
						
						
					 
					
						2015-07-23 12:16:31 -07:00 
						 
				 
			
				
					
						
							
							
								Benjamin Mako Hill 
							
						 
					 
					
						
						
						
						
							
						
						
							d934700ee9 
							
						 
					 
					
						
						
							
							added support to parse namespaces from title  
						
						
						
						
						This is necessary for wikis (e.g., Wikia XML dumps) that do not include
namespace metadata as tags within each <page>. 
						
					 
					
						2015-07-23 12:12:20 -07:00 
						 
				 
			
				
					
						
							
							
								Benjamin Mako Hill 
							
						 
					 
					
						
						
						
						
							
						
						
							108c8442b2 
							
						 
					 
					
						
						
							
							added README file to document the submodule  
						
						
						
					 
					
						2015-07-22 19:55:08 -07:00 
						 
				 
			
				
					
						
							
							
								Benjamin Mako Hill 
							
						 
					 
					
						
						
						
						
							
						
						
							eeb0742cc6 
							
						 
					 
					
						
						
							
							created new repository for wikiq with Mediawiki-Utilities as a submodule  
						
						
						
					 
					
						2015-07-22 19:44:52 -07:00